An In-Depth Analysis of the Social Media Platform Trell
Introduction
Welcome to our data story project, where we embark on an exciting journey to explore the depths of the social media platform Trell. Through a series of interlinked visualizations and explanatory text, we aim to unravel the intricate relationships within Trell’s user data and shed light on the factors influencing user engagement.
Trell, a popular social media platform, offers users a unique space to discover, create, and share their experiences through captivating visual content. In this project, we dive into a comprehensive dataset that encompasses a wide range of attributes related to Trell’s users. From user demographics and activity patterns to engagement metrics and content preferences, our dataset provides a rich foundation for uncovering fascinating insights.
Before we delve into the analysis, we diligently preprocess the dataset to ensure data quality and relevance. Cleaning the dataset, handling missing values, and transforming variables where necessary form the crucial groundwork for our exploration. By employing best practices in data preprocessing, we ensure that our subsequent analyses and visualizations are accurate and informative.
Throughout the project, we actively seek feedback from our Teaching Assistant (TA) and peers, recognizing the value of diverse perspectives in refining our analysis and improving the clarity of our visualizations. This iterative process enables us to present a compelling data story that effectively communicates the insights derived from the Trell dataset.
Join us on this captivating journey as we uncover the correlations between various attributes within Trell and unravel the secrets behind user engagement patterns. Through the fusion of data, visualizations, and explanatory text, we hope to empower researchers, marketers, and enthusiasts with a deeper understanding of the dynamic landscape of Trell.
# Imports
import pandas as pd
from scipy.stats import pearsonr
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as pyo
import numpy as np
pyo.init_notebook_mode()
Our perspectives
Perspective 1: Content Creator on Trell
As a content creator on Trell, you play a vital role in shaping the platform’s landscape and engaging with its user base. Through this perspective, we aim to provide insights into the factors that contribute to your success and help you optimize your content creation strategy.
By analyzing the dataset, we explore the correlation between various attributes and the content creator’s performance on Trell. We investigate factors such as user activity, the age groups of users, and audience sizes to understand their impact on content reach and engagement. Through visualizations and data-driven analysis, we aim to empower content creators with actionable insights to enhance their content’s visibility and impact.
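As a minimal sketch of how such a correlation can be quantified, the snippet below computes a Pearson coefficient with pandas. The toy values are invented for illustration; only the column names `content_views` and `avgTimeSpent` come from the dataset. `scipy.stats.pearsonr`, which the notebook imports, would additionally return a p-value.

```python
import pandas as pd

# Hypothetical toy data: column names follow the dataset, values are invented.
toy = pd.DataFrame({
    'content_views': [10, 20, 30, 40, 50],
    'avgTimeSpent': [1.0, 2.1, 2.9, 4.2, 5.0],
})

# Series.corr uses the Pearson coefficient by default
r = toy['content_views'].corr(toy['avgTimeSpent'])
print(f'Pearson r = {r:.3f}')
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none.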
Argument #1: A content creator shouldn’t upload at night.
The best time to upload a video is during the day, with effectiveness increasing towards the evening. Even though the differences between the parts of the day are less than 2%, the pie chart suggests that it is best not to upload during the night, as the video won't show up in many people's feeds.
# Graph 1
data = pd.read_csv('train_age_dataset.csv')
slot1_sum = int(data['slot1_trails_watched_per_day'].sum())
slot2_sum = int(data['slot2_trails_watched_per_day'].sum())
slot3_sum = int(data['slot3_trails_watched_per_day'].sum())
slot4_sum = int(data['slot4_trails_watched_per_day'].sum())
slot_sums = [slot1_sum, slot2_sum, slot3_sum, slot4_sum]
slots = ['00:00-05:59', '06:00-11:59', '12:00-17:59', '18:00-23:59']
fig = go.Figure(data=[go.Pie(labels=slots, values=slot_sums)])
fig.update_layout(
    title='Videos watched per time slot',
    height=500
)
fig.show()
Graph 1: Each slice of the pie chart represents a 6-hour interval of the day. It implies that users watch few videos during the night.
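To read the exact percentages rather than estimating them from the pie chart, the slot totals can be converted into shares. The following is a sketch under stated assumptions: the column names follow the notebook, but the toy values and the helper `slot_shares` are hypothetical and not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data: column names follow the notebook, values are invented.
toy = pd.DataFrame({
    'slot1_trails_watched_per_day': [0, 1, 1],
    'slot2_trails_watched_per_day': [5, 4, 6],
    'slot3_trails_watched_per_day': [6, 5, 5],
    'slot4_trails_watched_per_day': [7, 5, 5],
})

def slot_shares(frame):
    """Return each time slot's percentage share of all videos watched."""
    slot_columns = [f'slot{i}_trails_watched_per_day' for i in range(1, 5)]
    totals = frame[slot_columns].sum()
    return (totals / totals.sum() * 100).round(1)

shares = slot_shares(toy)
print(shares)
```

Running the same helper on the real dataset would give the numbers behind each slice of Graph 1.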
Argument #2: We should make videos aimed at a young audience.
To maximize views and channel growth, a content creator should focus on creating content for users under 18, because the younger audience is by far the biggest one. Even when the 18+ age groups are combined, they are still smaller than the under-18 audience.
# Graph 2
# Read the data from CSV
data = pd.read_csv('train_age_dataset.csv')
# Map the age group values to the corresponding labels
age_labels = {
    1: '<18',
    2: '18-24',
    3: '24-30',
    4: '>30'
}
data['age_group'] = data['age_group'].map(age_labels)
data['age_group'] = pd.Categorical(data['age_group'], categories=list(age_labels.values()), ordered=True)
# Group the data by age group and count the rows (users) in each group
grouped_data = data.groupby('age_group')['content_views'].count().reset_index()
# Sort the grouped data by age group
grouped_data = grouped_data.sort_values('age_group')
# Create lists for age groups and the number of users in each
age_groups = grouped_data['age_group'].tolist()
user_amount = grouped_data['content_views'].tolist()
# Create the Plotly bar chart
fig = go.Figure(data=[go.Bar(x=age_groups, y=user_amount)])
# Update the layout
fig.update_layout(
    xaxis_title='Age',
    yaxis_title='Amount of users',
    title='Amount of users per age group',
    height=500
)
# Display the plot
fig.show()
Graph 2: This bar chart has the number of users on the y-axis. It shows that there are a lot more users below 18 compared to other age groups.
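The comparison between the under-18 group and all older groups combined can be checked directly with relative frequencies. This is a minimal sketch on invented data; only the 1 to 4 coding of `age_group` and the labels are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data: 'age_group' uses the same 1-4 coding as the notebook.
toy = pd.DataFrame({'age_group': [1, 1, 1, 1, 1, 1, 2, 2, 3, 4]})

age_labels = {1: '<18', 2: '18-24', 3: '24-30', 4: '>30'}

# Share of users per age group, in percent
shares = (
    toy['age_group']
    .map(age_labels)
    .value_counts(normalize=True)
    .mul(100)
)

under_18 = shares['<18']
adults_combined = shares.drop('<18').sum()
print(f'<18: {under_18:.0f}%, 18+ combined: {adults_combined:.0f}%')
```

If `under_18` exceeds `adults_combined` on the real data, the claim that the youngest group outnumbers all others together holds.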
Argument #3: Viewer retention is lower for younger audiences.
The boxplot shows that, on average, the video completion rate of younger audiences is lower. Keeping a video engaging from start to finish can increase viewer retention for younger audiences.
On the other hand, it might also be a good idea to try out different video lengths, as this could account for the attention spans of different audiences.
# Graph 3
# Load the data from the CSV file
df = pd.read_csv('train_age_dataset.csv')
# Calculate the lower and upper bounds for outliers using Tukey's fences
Q1 = np.percentile(df['avgCompletion'], 25)
Q3 = np.percentile(df['avgCompletion'], 75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Filter out the outliers
df_filtered = df[(df['avgCompletion'] >= lower_bound) & (df['avgCompletion'] <= upper_bound)].copy()
filtered_age_groups = [1, 4]
# Copy the subset so that relabelling below does not write into a view of df_filtered
df_filtered_subset = df_filtered[df_filtered['age_group'].isin(filtered_age_groups)].copy()
# Map age group codes to labels
age_labels = {
    1: '<18',
    4: '>30'
}
df_filtered_subset['age_group'] = df_filtered_subset['age_group'].map(age_labels)
# Create the boxplot
fig = px.box(df_filtered_subset, x='age_group', y='avgCompletion', color='age_group',
             labels={'age_group': 'Age', 'avgCompletion': 'Average completion'},
             title='Average completion of a video by age group')
# Set the width and height of the figure
fig.update_layout(height=500)
# Show the boxplot
fig.show()
Graph 3: A boxplot of the average completion rate for the youngest and oldest age groups, with outliers removed using Tukey's fences. It implies that older users have a longer attention span.
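Tukey's fences mark a value as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below applies the same rule as the Graph 3 cell to a small invented array; the helper name `tukey_fences` and the sample values are hypothetical.

```python
import numpy as np

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) bounds outside which values count as outliers."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical completion rates; 3.0 is an obvious outlier
completion = np.array([0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 3.0])

lower, upper = tukey_fences(completion)
kept = completion[(completion >= lower) & (completion <= upper)]
print(f'fences: [{lower:.3f}, {upper:.3f}], kept {len(kept)} of {len(completion)} values')
```

Only the extreme value falls outside the fences here, so the remaining distribution is left untouched.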
Perspective 2: Trell’s Perspective
From Trell’s standpoint, understanding user behavior and preferences is crucial for increasing the platform’s ability to produce a profit. Through this perspective, we delve into the dataset to uncover valuable insights that can inform strategic decisions and shape Trell’s future development.
We examine the correlations between attributes such as user demographics, content viewing patterns, and engagement metrics to gain a comprehensive understanding of Trell’s user base. By analyzing trends related to video uploads, video completion rate, and different audiences by category, we aim to provide Trell with valuable insights to optimize the user experience and drive platform growth.
Argument #1: We should increase the incentive for male content creators to upload more videos.
By diversifying our content feed we can attract more advertisers to Trell, which benefits both our content creators and Trell. It gives advertisers more opportunities to target the male audience when buying advertisements on Trell.
# Graph 4
data = pd.read_csv('train_age_dataset.csv')
# Define the age group labels
age_labels = {
    1: '<18',
    2: '18-24',
    3: '24-30',
    4: '>30'
}
# Map the age group labels to the age_group column
data['age_group'] = data['age_group'].map(age_labels)
data['age_group'] = pd.Categorical(data['age_group'], categories=age_labels.values(), ordered=True)
# Group the data by age group and gender and calculate the average videos uploaded per person
grouped_data = data.groupby(['age_group', 'gender'])['creations'].mean().reset_index()
# Separate data for each gender
male_data = grouped_data[grouped_data['gender'] == 1]
female_data = grouped_data[grouped_data['gender'] == 2]
# Create bar traces for male and female genders
male_trace = go.Bar(
    x=male_data['age_group'],
    y=male_data['creations'],
    name='Male',
    visible=True  # Shown initially
)
female_trace = go.Bar(
    x=female_data['age_group'],
    y=female_data['creations'],
    name='Female',
    visible=False,  # Hidden here; made visible below before the figure is shown
    marker=dict(color='red')
)
# Create the layout
layout = go.Layout(
    title='Average videos uploaded by gender and age',
    xaxis=dict(title='Age'),
    yaxis=dict(title='Average videos uploaded'),
    height=500
)
# Create the figure and add the traces
fig = go.Figure(data=[male_trace, female_trace], layout=layout)
# Create dropdown menu buttons
buttons = [
    dict(
        args=[
            {'visible': [True, True]},
            {'yaxis': {'range': [0, 0.064]}}
        ],  # Show both traces
        label='Both',
        method='update'
    ),
    dict(
        args=[
            {'visible': [True, False]},
            {'yaxis': {'range': [0, 0.064]}}
        ],  # Show only male trace
        label='Male',
        method='update'
    ),
    dict(
        args=[
            {'visible': [False, True]},
            {'yaxis': {'range': [0, 0.064]}}
        ],  # Show only female trace
        label='Female',
        method='update'
    )
]
# Create the updatemenus property
updatemenus = [
    dict(
        buttons=buttons,
        direction='down',
        pad={'r': 10, 't': 10},
        showactive=True,
        x=0.9,
        xanchor='left',
        y=1.2,
        yanchor='top'
    )
]
# Update the figure layout with updatemenus
fig.update_layout(updatemenus=updatemenus)
# Add annotation
fig.update_layout(
    annotations=[
        dict(
            text='',
            showarrow=False,
            x=0,
            y=1.085,
            yref='paper',
            align='left'
        )
    ]
)
# Make the 'Female' trace visible too, so the chart starts with both traces shown (matching the default 'Both' button)
fig.update_traces(visible=True, selector=dict(name='Female'))
# Show the figure
fig.show()
Graph 4: This bar chart has the average amount of daily video uploads on the y-axis. It shows that male content creators above the age of 18 are less likely to upload videos, and that younger people are less likely to upload videos as well.
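The averages behind Graph 4 can also be laid out as a small table, which makes the gap between genders within each age group easier to read off. This is a sketch on invented values; the column names and the gender coding (1 for male, 2 for female) follow the notebook, everything else is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: column names and codes follow the notebook, values are invented.
toy = pd.DataFrame({
    'age_group': [1, 1, 2, 2, 2, 2],
    'gender':    [1, 2, 1, 1, 2, 2],
    'creations': [0, 1, 0, 0, 1, 3],
})

# Rows: age group, columns: gender, cells: mean number of uploads per user
avg_uploads = toy.pivot_table(
    index='age_group', columns='gender', values='creations', aggfunc='mean'
).rename(columns={1: 'Male', 2: 'Female'})

print(avg_uploads)
```

Each cell corresponds to the height of one bar in the chart.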
Argument #2: We should separate short and long content.
Short content viewers watch shorter videos, tend to have a higher completion rate, and watch more videos. As a result, they spend more time on Trell.
Long content viewers tend to watch fewer but longer videos. However, they also tend to spend less time on Trell and have a lower completion rate.
The same two types of viewers can be seen on YouTube, in the split between YouTube Shorts and regular YouTube videos. The graph shows that Trell also has a split between users who prefer long content and users who prefer short content. By properly targeting the right audience, Trell can tap into this market as well.
# Graph 5
data = pd.read_csv('train_age_dataset.csv')
content_views_categories = pd.qcut(data['content_views'], q=3, labels=['Low', 'Medium', 'High'])
avgCompletion_categories = pd.qcut(data['avgCompletion'], q=3, labels=['Low', 'Medium', 'High'])
avgTimeSpent_categories = pd.qcut(data['avgTimeSpent'], q=3, labels=['Low', 'Medium', 'High'])
avgDuration_categories = pd.qcut(data['avgDuration'], q=3, labels=['Low', 'Medium', 'High'])
colors = {
    'Low': '#b0c4de',
    'Medium': '#3cb371',
    'High': '#e9967a'
}
category_order = ['High', 'Medium', 'Low']
fig = go.Figure(data=go.Parcats(
    dimensions=[
        {'label': 'Average videos watched', 'values': content_views_categories, 'categoryorder': 'array', 'categoryarray': category_order},
        {'label': 'Completion rate', 'values': avgCompletion_categories, 'categoryorder': 'array', 'categoryarray': category_order},
        {'label': 'Average duration watched videos', 'values': avgDuration_categories, 'categoryorder': 'array', 'categoryarray': category_order},
        {'label': 'Average time spent', 'values': avgTimeSpent_categories, 'categoryorder': 'array', 'categoryarray': category_order}
    ],
    line={
        # Color each ribbon by its completion-rate category (0 = Low, 1 = Medium, 2 = High)
        'color': avgCompletion_categories.cat.codes,
        'colorscale': [[0, colors['Low']], [0.5, colors['Medium']], [1, colors['High']]]
    }
))
fig.update_layout(title='Metrics concerning user engagement', height=500)
fig.show()
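The parallel categories diagram relies on `pd.qcut`, which splits each metric into three equally sized bins (terciles). To show what that binning does and how two binned metrics can be compared numerically, here is a small sketch with `pd.crosstab`. The values are invented so that short videos go together with high completion; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy data: column names follow the dataset, values are invented.
toy = pd.DataFrame({
    'avgDuration':   [10, 12, 15, 40, 45, 50, 90, 100, 120],
    'avgCompletion': [0.9, 0.8, 0.85, 0.6, 0.5, 0.55, 0.2, 0.3, 0.25],
})

labels = ['Low', 'Medium', 'High']

# qcut assigns each user to the lowest, middle or highest third of a metric
duration_bin = pd.qcut(toy['avgDuration'], q=3, labels=labels)
completion_bin = pd.qcut(toy['avgCompletion'], q=3, labels=labels)

# Count how many users fall into each combination of bins
table = pd.crosstab(duration_bin, completion_bin)
print(table)
```

Each cell of the table corresponds to the width of one ribbon between two adjacent axes in the diagram, so the same pattern can be verified on the real data without reading it off the plot.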